Returning strings in C function, explanation needed

I understand that in order to return a string from a function I have to return a pointer. What I don't understand is why a char array is treated somewhat differently from, say, an integer, and gets destroyed when you exit the function. That's probably because I come from the world of high-level languages, but these two seem equally valid to me:
int x = 1;
return x;
char x[] = "hello";
return x;

The reason is simple yet subtle: C functions cannot be declared to return arrays.
When you return a pointer, like this:
char *foo(void)
{
    char x[] = "hello";
    return x;
}
The return x; is not actually returning x. It is returning a pointer to the first element of x (see note 1). It is exactly the same as saying:
return &x[0];
It should now be clearer why this isn't correct: it is exactly analogous to this:
int *foo(void)
{
    int x = 100;
    return &x;
}
It is, however, possible to return structures from functions, so you can return an array as long as it is wrapped inside a struct. The following is quite OK:
struct xyzzy {
    char x[10];
};
struct xyzzy foo(void)
{
    struct xyzzy x = { "hello" };
    return x;
}
1. This is a consequence of a special-case rule for array types. In an expression, if an array isn't the operand of either the unary & or sizeof operators, it evaluates to a pointer to its first element. Trying to pin down an actual array in C is a bit like trying to catch fog in your hands: it just disappears, replaced by a pointer.
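As a rough illustration of the struct approach above, the sketch below calls the answer's foo() and checks that the caller's copy is still usable after foo() has returned. The helper name copy_is_intact is hypothetical and only there for demonstration.

```c
#include <string.h>

struct xyzzy {
    char x[10];
};

/* Returns the whole struct by value, so the array inside it is copied out. */
struct xyzzy foo(void)
{
    struct xyzzy x = { "hello" };
    return x;
}

/* Hypothetical helper: returns 1 if the caller's copy still holds "hello"
   after foo() has returned and its local x no longer exists. */
int copy_is_intact(void)
{
    struct xyzzy result = foo();   /* result is the caller's own object */
    return strcmp(result.x, "hello") == 0;
}
```

The cost is that the whole struct is copied on return, which is fine for a 10-byte array but worth keeping in mind for large ones.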

An integer is a primitive type, so its value is returned as such. However, in C, a string is not a primitive type, which is why you declare it as an array of characters. And when you return an array, you are actually returning a pointer to the original array, so it is a kind of return-by-reference, if you will. But as you can see, the pointer to char x[] is invalid once you exit the function.

Please see this detailed answer I have written about this. Beyond that, in C, variables created within the scope of a function have automatic storage (typically on the stack), and that storage is released when the function returns. This is why a local string, i.e. a char [] or a pointer to one, cannot safely be returned: its storage must outlive the function call, otherwise garbage will be returned. The workaround is to use a static buffer or to use malloc.

That char array is allocated within the scope of the function (rather than dynamically allocated eg by malloc()), so it will be unavailable outside of that scope.
Two ways around it:
Declare that char array to be static
Allocate the buffer in the calling function, passing it as an argument, then this function just fills it.
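The two workarounds listed above might look roughly like this. This is only a sketch: the function names greeting_static and greeting_into, the buffer size of 16, and the 0 / -1 return convention are assumptions made for illustration.

```c
#include <stdio.h>
#include <stddef.h>

/* Option 1: static buffer. The array lives for the whole program, but every
   call shares (and overwrites) the same storage. */
const char *greeting_static(void)
{
    static char buf[16];
    snprintf(buf, sizeof buf, "%s", "hello");
    return buf;
}

/* Option 2: the caller owns the buffer and passes its size.
   Returns 0 on success, -1 if the buffer is too small. */
int greeting_into(char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "%s", "hello");
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

The static version is convenient but not reentrant; the caller-supplied buffer is usually the safer default, since ownership and size are explicit.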

Related

Char and double pointers declaration

I'm starting out with the C programming language and I have a question about the initialization of pointers. I know that when we declare a pointer we are reserving memory for the pointer itself, but the pointer may point to some address we don't know anything about. The point is that I can do this:
char * s = "Hi"; but I can't do this double * f = 3; so what I do to "initialize" this double pointer is the following piece of code:
double * d;
double x = 3;
d = &x;
So that I have the pointer pointing to a specific memory location with the initial value I want to have.
Summing up, my question is:
1.- Is there any way to initialize double/int pointers as with char ones?

Yes, but you usually should not:
double *x = (double []) {3};
Just as char *x = "Hi"; creates an array of char and initializes x to point to the first member of the array, the code above creates an array of double and initializes x to point to its first member. (I have shown only a single element, but you could write more.) This is called a compound literal.
An important difference is that the lifetime of a string literal is the entire execution of the program, so you can safely return its address from a function. In contrast, a compound literal inside a function exists only as long as the function is executing, so its address cannot be returned from the function. If this definition appeared outside a function, the compound literal would exist for the entire execution of the program.
Compound literals are useful at times, but the situations where it is good to initialize a pointer to the address of a compound literal as I show above are very limited. Code like that should be used with restraint and discretion.
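One situation where a compound literal tends to be unproblematic is passing a short-lived array straight into a function call. The sketch below assumes a hypothetical helper sum(); neither function name comes from the answer above.

```c
#include <stddef.h>

/* Hypothetical helper: sums n doubles. */
double sum(const double *values, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += values[i];
    return total;
}

/* The compound literal lives until the end of the enclosing block, which is
   long enough for the call to sum() to finish. Only the computed value
   leaves the function, never the literal's address. */
double sum_of_three(void)
{
    return sum((double []){1.0, 2.0, 3.0}, 3);
}
```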

The best you could do would be to use an array instead of a raw pointer. Since the array decays to a pointer in almost every use case, it will behave roughly the way you want.
double d[] = {3};
will allocate the space (globally or on the stack, depending on whether it's declared at global or function scope) and initialize it at once. For almost all purposes, referring to d will cause it to decay to a double*.
The main difference is that, for stack allocated arrays (function scope, without static qualifier), the array only lives until the end of the function call. Declaring it as static double d[] = {3}; will make it live for the life of the program, but it will have a single initialization, and any changes will persist indefinitely.
The other important distinction is that you can't modify the address, since it's not really a pointer; ++d won't be legal, since d is an array, not a pointer.
For an exact equivalent of char *s = "Hi";, you're basically stuck with a two-liner. That code is roughly equivalent to:
static char unnamed[] = {'H', 'i', '\0'};
char *s = unnamed;
It's a special behavior of C string literals, which no other type has direct language support for. Mimicking it precisely with double would always be a two liner:
static double dstorage[] = {3};
double *d = dstorage;

Returning array declared in function returns address of local variable in C?

I realize that variables declared in a function call are pushed onto the stack and once the function reaches its end the local variables declared onto the stack are popped off and go into lala land.
The thing I don't understand is that if I declare a pointer in a function I can return the pointer with no compiler complaint, whereas with an array it does give me a warning.
Here is an example of what I'm referring to:
char * getStringArray();
char * getStringPointer();
int main(){
    char * temp1 = getStringPointer();
    char * temp2 = getStringArray();
    return 0;
}
char * getStringPointer(){
    char * retString = "Fred";
    return retString;
}
char * getStringArray(){
    char retString[5] = {'F', 'r','e','d','\0'};
    return retString;
}
During compilation it throws a "returning address of local variable" warning about getStringArray(). What confuses me is that I've been told that referencing an array solely by its name (like retString, with no []) refers to its address in memory, acting like a pointer.
Arrays can be handy and there are many times I would like to use an array in a function and return it, is there any quick way around this?
As I referred to in a comment, will this work? I'm guessing malloc allocates on the heap, so it's fine. I'm still a little unclear on the difference between static data and data on the heap.
char * getStringPointer(){
    char * retString = (char *)malloc(sizeof(char)*4+1);
    *(retString+0) = 'F';
    *(retString+1) = 'r';
    *(retString+2) = 'e';
    *(retString+3) = 'd';
    *(retString+4) = '\0';
    return retString;
}

getStringPointer is allocating a pointer on the stack, then returning that pointer, which points to the string 'Fred\0' somewhere in your executable (not on the stack).
getStringArray allocates space for 5 chars on the stack, assigns them 'F' 'r' 'e' 'd' and '\0', then returns a pointer to the address of 'F', which is on the stack (and therefore invalid after the function returns).
There are two ways round it: either you can malloc some space for your array on the heap, or you can make a struct that contains an appropriately sized array and return that. You can return numerical, pointer and struct types from functions, but not arrays.
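The malloc route mentioned above could be sketched as follows. The function name getStringHeap is hypothetical; the important assumptions are that the caller checks for NULL and eventually calls free() on the result.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of "Fred", or NULL if allocation fails.
   The caller owns the result and must free() it. */
char *getStringHeap(void)
{
    const char src[] = "Fred";
    char *copy = malloc(sizeof src);   /* sizeof src includes the '\0' */
    if (copy != NULL)
        memcpy(copy, src, sizeof src);
    return copy;
}
```

Unlike the stack array, this memory stays valid after the function returns, and unlike a static array, each call gets its own independent block.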

"Fred" is an unnamed global, so it's not going away, and returning a pointer to it is safe. But getStringArray() is de-allocating everything that pointer points to. That's the difference.

The array name unambiguously refers to stack-local memory, so the compiler can be certain that this is what you are leaking; that's why you get the warning.
Just make the array static and both the warning and the lifetime restriction will vanish.
Now, the pointer could be pointing to something static or in the heap (in your case it actually was static) and so it's not obvious from looking at just that one expression whether you are leaking something stack-local or not. That's why you don't get the warning when you return a pointer, regardless of whether it's a safe return or not. Again, the static storage class will save you. Your string constant happens to be static, so that works fine.

Adding the static modifier to the variable retString and returning a pointer to it will resolve the issue.
char * getStringArray(){
    static char retString[5] = {'F', 'r','e','d','\0'};
    return retString;
}
That will return the appropriate value.
Notice how it's still treated as a pointer, because the array decays into a pointer to its first element, of type T * (in this case, char *). This will make the compiler happy as it matches the prototype declaration of the function itself as defined at the start of the source.
See ideone sample

Why can't we assign int* x=12 or int* x= "12" when we can assign char* x= "hello"?

What is the correct way to use int* x?
Mention any related link if possible as I was unable to find one.

Because the literal "hello" evaluates to a pointer to constant memory initialised with the string "hello" (and a nul terminator), i.e. the value you get is of char* type.
If you want a pointer to number 12 then you'll need to store the value 12 somewhere, e.g. in another int, and then take a pointer to that:
int x_value = 12;
int* x = &x_value;
However in this case you're putting the 12 on the stack, and so that pointer will become invalid once you leave this function.
You can at a pinch abuse that mechanism to make yourself a pointer to 12; depending on endianness that would probably be
int* x = (int*)("\x0c\x00\x00");
Note that this is making assumptions about your host's endianness and size of int, and that you would not be able to modify that 12 either (but you can change x to point to something else), so this is a bad idea in general.

Because the compiler creates a static (constant) string "hello" and lets x point to that, where it doesn't create a static (constant) int.

A string literal creates an array object. This object has static storage duration (meaning it exists for the entire execution of the program), and is initialized with the characters in the string literal.
The value of a string literal is the value of the array. In most contexts, there is an implicit conversion from char[N] to char*, so you get a pointer to the initial (0th) element of the array. So this:
char *s = "hello";
initializes s to point to the initial 'h' in the implicitly created array object. A pointer can only point to an object; it does not point to a value. (Incidentally, that really should be const char *s, so you don't accidentally attempt to modify the string.)
String literals are a special case. An integer literal does not create an object; it merely yields a value. This:
int *ptr = 42; // INVALID
is invalid, because there is no implicit conversion of 42 from int to int*. This:
int *ptr = &42; // INVALID
is also invalid, because the & (address-of) operator can only be applied to an object (an "lvalue"), and there is no object for it to apply to.
There are several ways around this; which one you should use depends on what you're trying to do. You can allocate an object:
int *ptr = malloc(sizeof *ptr); // allocate an int object
if (ptr == NULL) { /* handle the error */ }
but a heap allocation can always fail, and you need to deallocate it when you're finished with it to avoid a memory leak. You can just declare an object:
int obj = 42;
int *ptr = &obj;
You just have to be careful with the object's lifetime. If obj is a local variable, you can end up with a dangling pointer. Or, in C99 and later, you can use a compound literal:
int *ptr = &(int){42};
(int){42} is a compound literal, which is similar in some ways to a string literal. In particular, it does create an object, and you can take that object's address.
But unlike with string literals, the lifetime of the (anonymous) object created by a compound literal depends on the context in which it appears. If it's inside a function definition, the lifetime is automatic, meaning that it ceases to exist when you leave the block containing it -- just like an ordinary local variable.
That answers the question in your title. The body of your question:
What is the correct way to use int* x?
is much more general, and it's not a question we can answer here. There are a multitude of ways to use pointers correctly -- and even more ways to use them incorrectly. Get a good book or tutorial on C and read the section that discusses pointers. Unfortunately there are also a lot of bad books and tutorials. Question 18.10 of the comp.lang.c FAQ is a good starting point. (Bad tutorials can often be identified by the casual use of void main(), and by the false assertion that arrays are really pointers.)
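To make the compound-literal option from the answer above a little more concrete, here is a minimal sketch. The function name bump_answer is hypothetical. It uses the anonymous object as scratch storage and returns its value rather than its address, since the object stops existing when the enclosing block ends.

```c
/* Uses a compound literal as scratch storage and returns its value, never
   its address: the object's lifetime ends with this function's block. */
int bump_answer(void)
{
    int *ptr = &(int){42};   /* anonymous int with automatic lifetime */
    *ptr += 1;               /* the object is modifiable */
    return *ptr;             /* returns a copy of the value */
}
```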

Q1. Why can't we assign int *x=12? You can (with an explicit cast), provided that 12 is a valid memory address which holds an int. But on a modern OS, specifying a hard-coded memory address is almost always wrong (except perhaps in embedded code). The usage is typically like this:
int y = 42; // simple var
int *x = &y; // address-of: x is pointer to y
*x = 12; // write a new value to y
This looks the same as what you asked, but it is not, because your original declaration assigns the value 12 to x the pointer itself, not to *x its target.
Q2. Why can't we assign int *x = "12"? Because you are trying to assign an incompatible type - a char pointer to int pointer. "12" is a string literal which is accessed via a pointer.
Q3. But we can assign char* x= "hello"
Putting Q1 and Q2 together, "hello" generates a pointer which is assigned to the correct type char*.

Here is how it is done properly:
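#include <stdio.h>
#include <stdlib.h>
int main(void) {
    int *x = malloc(sizeof *x);
    if (x == NULL) {
        return 1; /* allocation failed */
    }
    *x = 8;
    printf("%d\n", *x);
    free(x); /* release the memory when done */
    return 0;
}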

Declare and initialize pointer concisely (i. e. pointer to int)

Given pointers to char, one can do the following:
char *s = "data";
As far as I understand, a pointer variable is declared here, memory is allocated for both variable and data, the latter is filled with data\0 and the variable in question is set to point to the first byte of it (i. e. variable contains an address that can be dereferenced). That's short and compact.
Given pointers to int, for example, one can do this:
int *i;
*i = 42;
or that:
int i = 42;
foo(&i); // prefix every time to get a pointer
bar(&i);
baz(&i);
or that:
int i = 42;
int *p = &i;
That's somewhat tautological. It's small and tolerable with one usage of a single variable. It's not with multiple uses of several variables, though, producing code clutter.
Are there any ways to write the same thing dry and concisely? What are they?
Are there any broader-scope approaches to programming, that allow to avoid the issue entirely? May be I should not use pointers at all (joke) or something?

String literals are a corner case : they trigger the creation of the literal in static memory, and its access as a char array. Note that the following doesn't compile, despite 42 being an int literal, because it is not implicitly allocated :
int *p = &42;
In all other cases, you are responsible for allocating the pointed-to object, be it in automatic or dynamic memory.
int i = 42;
int *p = &i;
Here i is an automatic variable, and p points to it.
int * i;
*i = 42;
You just invoked Undefined Behaviour. i has not been initialized, and is therefore pointing somewhere at random in memory. Then you assigned 42 to this random location, with unpredictable consequences. Bad.
int *i = malloc(sizeof *i);
Here i is initialized to point to a dynamically-allocated block of memory. Don't forget to free(i) once you're done with it.
int i = 42, *p = &i;
And here is how you create an automatic variable and a pointer to it as a one-liner. i is the variable, p points to it.
Edit : seems like you really want that variable to be implicitly and anonymously allocated. Well, here's how you can do it :
int *p = &(int){42};
This thingy is a compound literal. They are anonymous instances with automatic storage duration (or static at file scope), and only exist in C99 and later (but not C++ !). As opposed to string literals, compound literals are mutable, i.e. you can modify *p.
Edit 2 : Adding this solution inspired from another answer (which unfortunately provided a wrong explanation) for completeness :
int i[] = {42};
This will allocate a one-element mutable array with automatic storage duration. The name of the array, while not a pointer itself, will decay to a pointer as needed.
Note however that sizeof i will return the "wrong" result, that is the actual size of the array (1 * sizeof(int)) instead of the size of a pointer (sizeof(int*)). That should however rarely be an issue.
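The sizeof difference noted above can be seen in a small sketch like the following. Both function names are hypothetical, and the three-element array is just an example.

```c
#include <stddef.h>

/* Element count computed where i is still an array. */
size_t count_in_array(void)
{
    int i[] = {42, 43, 44};
    return sizeof i / sizeof i[0];   /* sizeof sees the whole array */
}

/* Once the array has been passed to a function, only a pointer arrives. */
size_t size_seen_by_callee(int *p)
{
    return sizeof p;                 /* size of a pointer, not of the array */
}
```

This is why functions that take an array argument normally also take its length as a separate parameter.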

int i=42;
int *ptr = &i;
this is equivalent to writing
int i=42;
int *ptr;
ptr=&i;
Though this is definitely confusing, it's quite useful during function calls:
#include <stdio.h>

void function2(int *ptr);

void function1(void)
{
    int i=42;
    function2(&i);
}
void function2(int *ptr)
{
    printf("%d",*ptr); //outputs 42
}
Here, we can easily use this confusing notation to declare and initialize the pointer during function calls. We don't need to declare the pointer globally and then initialize it during function calls. We have a notation to do both at the same time.
int *ptr; //declares the pointer but does not initialize it
//so, ptr points to some random memory location
*ptr=42; //you gave a value to this random memory location
Though this will compile, it will invoke undefined behaviour, as you never actually initialized the pointer.
Also,
char *ptr;
char str[6]="hello";
ptr=str;
EDIT: as pointed out in the comments, these two cases are not equivalent.
But the pointer points to "hello" in both cases. This example is written just to show that we can initialize pointers in both these ways (to point to hello), though the two differ in many respects.
char *ptr;
ptr="hello";
This works because the name of the array, str, decays to a pointer to the 0th element of the string, i.e. 'h'.
The same goes for any array arr[], where arr decays to the address of its 0th element.

You can also think of it as an array, int i[1]={42}, where i decays to a pointer to int.

int * i;
*i = 42;
will invoke undefined behavior. You are modifying an unknown memory location. You need to initialize pointer i first.
int i = 42;
int *p = &i;
is the correct way. Now p is pointing to i and you can modify the variable pointed to by p.
Are there any ways to write the same thing dry and concisely?
No. As there is no pass by reference in C you have to use pointers when you want to modify the passed variable in a function.
Are there any broader-scope approaches to programming, that allow to avoid the issue entirely? May be I should not use pointers at all (joke) or something?
If you are learning C then you can't avoid pointers, and you should learn to use them properly.
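Since C has no pass-by-reference, the usual pattern the answer above alludes to is to pass the address and let the function write through it. A minimal sketch, with hypothetical names add_to and demo_add_to:

```c
/* Hypothetical helper: adds delta to the int that target points to. */
void add_to(int *target, int delta)
{
    *target += delta;
}

/* Shows the caller's variable being modified through the pointer. */
int demo_add_to(void)
{
    int i = 42;
    add_to(&i, 8);   /* pass the address of i, not a copy of its value */
    return i;        /* i has been changed by add_to() */
}
```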

Stack memory fundamentals

Consider this code:
char* foo(int myNum) {
    char* StrArray[5] = {"TEST","ABC","XYZ","AA","BB"};
    return StrArray[4];
}
When I return StrArray[4] to the caller, is this supposed to work?
Since the array is defined on the stack, when the caller gets the pointer, that part of memory has gone out of scope. Or will this code work?

This code will work. You are returning the value of the pointer in StrArray[4], which points to a constant string "BB". Constant strings have a lifetime equal to that of your entire program.
The important thing is the lifetime of what the pointer points to, not where the pointer is stored. For example, the following similar code would not work:
char* foo(int myNum) {
    char bb[3] = "BB";
    char* StrArray[5] = {"TEST","ABC","XYZ","AA",bb};
    return StrArray[4];
}
This is because the bb array is a temporary value on the stack of the foo() function, and disappears when you return.

Beware: you're lying to the compiler.
Each element of StrArray points to read-only char data;
You're telling the compiler the return value of your function is a modifiable char *.
Lie to the compiler and it will get its revenge sooner or later.
In C, the way to declare a pointer to read-only data is to qualify it with const.
I'd write your code as:
const char* foo(int myNum) {
    const char* StrArray[5] = {"TEST","ABC","XYZ","AA","BB"};
    return StrArray[4];
}

The code will work. The pointer you are returning (StrArray[4]) points to a string literal "BB". String literals in C are anonymous array objects with static storage duration, which means that they live as long as your program lives (i.e. forever). It doesn't matter where you create that string literal. Even if it is introduced inside a function, it still has static storage duration.
Just remember, that string literals are not modifiable, so it is better to use const char* pointers with string literals.
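Putting the const advice together with the lifetime rule, a lookup function along these lines is one possible shape. The name name_at, the bounds check, and the NULL return for an out-of-range index are assumptions added for illustration, not part of the original question.

```c
#include <stddef.h>

/* Hypothetical variant of foo(): returns the string at index, or NULL if
   index is out of range. The table itself is static and const, and the
   string literals it points to have static storage duration, so the
   returned pointer stays valid after the call. */
const char *name_at(size_t index)
{
    static const char *const names[] = {"TEST", "ABC", "XYZ", "AA", "BB"};
    size_t count = sizeof names / sizeof names[0];
    return index < count ? names[index] : NULL;
}
```

Returning const char * lets the compiler reject attempts to write through the pointer, which would otherwise be undefined behaviour for a string literal.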

C uses arrays with indices beginning at 0, so the first element is StrArray[0].
Thus StrArray[5] was not declared in your code.
C will allow you to write code that returns StrArray[5], but what happens is undefined and will differ by OS and compiler; often it will crash the program.

Only the array of pointers is on the stack, not the string-literal-constants.
