C: Behaviour of arrays when assigned to pointers

#include <stdio.h>
int main(void)
{
    char *ptr;
    ptr = "hello";
    printf("%p %s", (void *)"hello", ptr); /* %p expects a void * */
    getchar();
    return 0;
}
Hi, I am trying to understand clearly how arrays get assigned to pointers. I notice that when you assign an array of chars to a pointer to char, as in ptr = "hello";, the array decays to a pointer. But in this case I am assigning an array of chars that is not stored in a variable. Does this kind of assignment reserve a memory address specially for "hello" (which is obviously what is happening), and is it possible to modify each element of "hello" at the memory address where this array is stored? As a comparison, is it fine for me to assign an array of ints to a pointer, for example something as vague as int_ptr = 5,3,4,3;, so that the values 5,3,4,3 get placed at a memory address as "hello" did? And if not, why is it possible only with strings? Thanks in advance.

"hello" is a string literal. It is a nameless non-modifiable object of type char [6]. It is an array, and it behaves the same way any other array does. The fact that it is nameless does not really change anything. You can use it with [] operator for example, as in "hello"[3] and so on. Just like any other array, it can and will decay to pointer in most contexts.
You cannot modify the contents of a string literal because it is non-modifiable by definition. It can be physically stored in read-only memory. It can overlap other string literals, if they contain common sub-sequences of characters.
Similar functionality exists for other array types through compound literal syntax
int *p = (int []) { 1, 2, 3, 4, 5 };
In this case the right-hand side is a nameless object of type int [5], which decays to int * pointer. Compound literals are modifiable though, meaning that you can do p[3] = 8 and thus replace 4 with 8.
You can also use compound literal syntax with char arrays and do
char *p = (char []) { "hello" };
In this case the right-hand side is a modifiable nameless object of type char [6].

The first thing you should do is read section 6 of the comp.lang.c FAQ.
The string literal "hello" is an expression of type char[6] (5 characters for "hello" plus one for the terminating '\0'). It refers to an anonymous array object with static storage duration, initialized at program startup to contain those 6 character values.
In most contexts, an expression of array type is implicitly converted to a pointer to the first element of the array; the exceptions are:
When it's the argument of sizeof (sizeof "hello" yields 6, not the size of a pointer);
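When it's the argument of _Alignof (listed in the N1570 draft of C11, though _Alignof accepts only a parenthesized type name, so this case does not actually arise);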
When it's the argument of unary & (&arr yields the address of the entire array, not of its first element; same memory location, different type); and
When it's a string literal in an initializer used to initialize an array object (char s[6] = "hello"; copies the whole array, not just a pointer).
None of these exceptions apply to your code:
char *ptr;
ptr = "hello";
So the expression "hello" is converted to ("decays" to) a pointer to the first element ('h') of that anonymous array object I mentioned above.
So *ptr == 'h', and you can advance ptr through memory to access the other characters: 'e', 'l', 'l', 'o', and '\0'. This is what printf() does when you give it a "%s" format.
That anonymous array object, associated with the string literal, is read-only, but not const. What that means is that any attempt to modify that array, or any of its elements, has undefined behavior (because the standard explicitly says so) -- but the compiler won't necessarily warn you about it. (C++ makes string literals const; doing the same thing in C would have broken existing code that was written before const was added to the language.) So no, you can't modify the elements of "hello" -- or at least you shouldn't try. And to make the compiler warn you if you try, you should declare the pointer as const:
const char *ptr; /* pointer to const char, not const pointer to char */
ptr = "hello";
(gcc has an option, -Wwrite-strings, that causes it to treat string literals as const. This will cause it to warn about some C code that's legal as far as the standard is concerned, but such code should probably be modified to use const.)

#include <stdio.h>
int main(void)
{
    char *ptr;
    ptr = "hello";
    // instead of the above two lines you can write char *ptr = "hello";
    printf("%p %s", (void *)"hello", ptr);
    getchar();
    return 0;
}
Here you have assigned the string literal "hello" to ptr. The string literal may be stored in read-only memory, so you can't modify it. If you declare char ptr[] = "hello"; instead, then you can modify the array.

Say what?
Your code allocates 6 bytes of memory and initializes it with the values 'h', 'e', 'l', 'l', 'o', and '\0'.
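It then allocates a pointer (the number of bytes for the pointer depends on the implementation) and sets the pointer's value to the start of the 6 bytes mentioned previously.
You can modify the elements of a writable array using syntax such as ptr[1] = 'a'. Here, though, ptr points at a string literal, so such a write has undefined behavior; copy the string into an array first (char buf[] = "hello";) if you need to change it.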
Syntactically, strings are a special case. Since C doesn't have a specific string type to speak of, it does offer some shortcuts to declaring them and such. But you can easily create the same type of structure as you did for a string using int, even if the syntax must be a bit different.

Related

Pointer assignment to different types

We can assign a string in C as follows:
char *string;
string = "Hello";
printf("%s\n", string); // string
printf("%p\n", string); // memory-address
And a number can be done as follows:
int num = 4404;
int *nump = &num;
printf("%d\n", *nump);
printf("%p\n", nump);
My question then is, why can't we assign a pointer to a number in C just like we do with strings? For example, doing:
int *num;
num = 4404;
// and the rest...
What makes a string fundamentally different than other primitive types? I'm quite new to C so any explanation as to the difference between the two would be very helpful.
There is no such type as "string" in C. A string is not a primitive type. A string is just an array of characters, terminated by a NUL byte ('\0').
When you do this:
char *string;
string = "Hello";
What really happens is that the compiler creates a read-only char array and assigns the address of its first element to your variable string. This works because, in most expressions, the name of an array is converted to a pointer to its first element.
// This is placed in a different section:
const char hidden_arr[] = {'H', 'e', 'l', 'l', 'o', '\0'};
char *string;
string = hidden_arr;
// Same as:
string = &(hidden_arr[0]);
Here, hidden_arr is converted to a pointer to its first element, so the right-hand side of the assignment is a char pointer just like string (the array itself is still an array, not a pointer). Of course, all of this is done transparently; you will not actually see another variable named hidden_arr, that's just an example. In reality the string will be stored in some location in your executable without a name, and the address of that location will be copied to your string pointer.
When you try to do the same with an integer, it's wrong because int * and int are different types, and you cannot write this (well, you can, but it's meaningless and does not do what you expect it to):
int *ptr;
ptr = 123;
But, you can very well do it with an array of integers:
int arr[] = {1, 2, 3};
int *ptr;
ptr = arr;
// Same as:
ptr = &(arr[0]);
why can't we assign a pointer to a number in C just like we do with strings?
int *num;
num = 4404;
Code can do that if 4404 is a valid address for an int.
An integer may be converted to any pointer type. Except as previously specified, the
result is implementation-defined, might not be correctly aligned, might not point to an
entity of the referenced type, and might be a trap representation.
C11dr §6.3.2.3 5
If the address is not properly aligned --> undefined behavior (UB).
If the address is a trap --> undefined behavior (UB).
Attempting to de-reference the pointer is a problem unless it points to a valid int.
printf("%d\n", *num);
In the code below, "Hello" is a string literal. It exists someplace. The assignment takes the address of the string literal and assigns it to string.
char *string;
string = "Hello";
The point is that the address assigned is known to be valid for a char *.
In num = 4404;, the address is not known to be valid (it likely is not).
What makes a string fundamentally different than other primitive types?
In C, a string is a C library specification, not a C language one. Its definition is a convenience for explaining the various functions therein.
A string is a contiguous sequence of characters terminated by and including the first null character §7.1.1 1
Primitive types are part of the C language.
The language also has string literals, like "Hello" in char *string; string = "Hello";. These have some similarity to strings, yet differ.
I recommend searching for "ISO/IEC9899:2017" to find a draft copy of the current C spec. It will answer many of your 10 questions of the last week.
What makes a string fundamentally different than other primitive types?
A string seems like a primitive type in C because the compiler understands "foo" and generates a null-terminated character array: ['f', 'o', 'o', '\0']. But a C string is still just that: an array of characters.
My question then is, why can't we assign a pointer to a number in C just like we do with strings?
You certainly can assign a pointer to a number; it's just that a number isn't a pointer, whereas the value of an array expression is the address of its first element. If you had an array of int, then that would work just like a string. Compare your code:
char *string;
string = "Hello";
printf("%s\n", string); // string
printf("%p\n", string); // memory-address
to the analogous code for an array of integers:
int numbers[] = {1, 2, 3, 4, 5, 0};
int *nump = numbers;
printf("%d\n", nump[0]); // first element
printf("%p\n", nump); // memory-address
The only real difference is that the compiler has some extra syntax for arrays of characters because they're so common, and printf() similarly has a format specifier just for character arrays for the same reason.
The type of a string literal (e.g. "hello world") is an array of char. Assigning char *string = "Hello" means that string now points to the start of the array (i.e. the address of the array's first element).
Whereas you can't assign an integer to a pointer because their types are different: one is an int, the other is a pointer, int *. You could cast it to the correct type:
int *num;
num = (int *) 4404;
But this would be considered quite dangerous (unless you really know what you are doing). That is, do you know what is at memory address 4404?

Why pointers can't be used to index arrays? [duplicate]

I am trying to change value of character array components using a pointer. But I am not able to do so. Is there a fundamental difference between declaring arrays using the two different methods i.e. char A[] and char *A?
I tried accessing arrays using A[0] and it worked. But I am not able to change values of the array components.
{
char *A = "ab";
printf("%c\n", A[0]); //Works. I am able to access A[0]
A[0] = 'c'; //Segmentation fault. I am not able to edit A[0]
printf("%c\n", A[0]);
}
Expected output:
a
c
Actual output:
a
Segmentation fault
The difference is that char A[] defines an array and char * does not.
The most important thing to remember is that arrays are not pointers.
In this declaration:
char *A = "ab";
the string literal "ab" creates an anonymous array object of type char[3] (2 plus 1 for the terminating '\0'). The declaration creates a pointer called A and initializes it to point to the initial character of that array.
The array object created by a string literal has static storage duration (meaning that it exists through the entire execution of your program) and does not allow you to modify it. (Strictly speaking an attempt to modify it has undefined behavior.) It really should be const char[3] rather than char[3], but for historical reasons it's not defined as const. You should use a pointer to const to refer to it:
const char *A = "ab";
so that the compiler will catch any attempts to modify the array.
In this declaration:
char A[] = "ab";
the string literal does the same thing, but the array object A is initialized with a copy of the contents of that array. The array A is modifiable because you didn't define it with const -- and because it's an array object you created, rather than one implicitly created by a string literal, you can modify it.
An array indexing expression, like A[0], actually requires a pointer as one of its operands (and an integer as the other). Very often that pointer will be the result of an array expression "decaying" to a pointer, but it can also be just a pointer -- as long as that pointer points to an element of an array object.
The relationship between arrays and pointers in C is complicated, and there's a lot of misinformation out there. I recommend reading section 6 of the comp.lang.c FAQ.
You can use either an array name or a pointer to refer to elements of an array object. You ran into a problem with an array object that's read-only. For example:
#include <stdio.h>
int main(void) {
char array_object[] = "ab"; /* array_object is writable */
char *ptr = array_object; /* or &array_object[0] */
printf("array_object[0] = '%c'\n", array_object[0]);
printf("ptr[0] = '%c'\n", ptr[0]);
}
Output:
array_object[0] = 'a'
ptr[0] = 'a'
String literals like "ab" are supposed to be immutable, like any other literal (you can't alter the value of a numeric literal like 1 or 3.1419, for example). Unlike numeric literals, however, string literals require some kind of storage to be materialized. Some implementations (such as the one you're using, apparently) store string literals in read-only memory, so attempting to change the contents of the literal will lead to a segfault.
The language definition leaves the behavior undefined - it may work as expected, it may crash outright, or it may do something else.
String literals are not meant to be overwritten, think of them as read-only. It is undefined behavior to overwrite the string and your computer chose to crash the program as a result. You can use an array instead to modify the string.
char A[3] = "ab";
A[0] = 'c';
Is there a fundamental difference between declaring arrays using the two different methods i.e. char A[] and char *A?
Yes, because the second one is not an array but a pointer.
The type of "ab" is char /*readonly*/ [3]. It is an array with immutable content. So when you want a pointer to that string literal, you should use a pointer to char const:
char const *foo = "ab";
That keeps you from altering the literal by accident. If you however want to use the string literal to initialize an array:
char foo[] = "ab"; // the size of the array is determined by the initializer
// here: 3 - the characters 'a', 'b' and '\0'
The elements of that array can then be modified.
Array-indexing btw is nothing more than syntactic sugar:
foo[bar]; /* is the same as */ *(foo + bar);
That's why one can do funny things like
"Hello!"[2]; /* 'l' but also */ 2["Hello!"]; // 'l'

How can a character pointer without any address specification hold data?

The following C program is not supposed to work by my understanding of pointers but it does.
#include<stdio.h>
main() {
char *p;
p = "abcdefghijk";
printf("%s", p);
}
Outputs:
abcdefghijk
The char pointer variable p is pointing to something random as I have not assigned any address to it like p = &i; where i is some char array.
That means if I try to write anything to the memory address held by the pointer p it should give me segmentation fault since it is some random address not assigned to my program by the OS.
But the program compiles and runs successfully. What is happening?
In this expression statement
p="abcdefghijk";
the pointer p is assigned with the address of the first character of the string literal "abcdefghijk" that the compiler stores as a zero-terminated character array in the static memory area.
Thus in this statement there are two things that happen. At first the compiler creates an unnamed character array with the static storage duration to hold the string literal. Then the address of the first character of the array is assigned to the pointer. You can imagine it the following way
char unnamed[] = { 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', '\0' };
p = unnamed;
or
p = &unnamed[0];
Take into account that although string literals in C have non-constant character array types (unlike C++, where they have constant character array types), you still may not change string literals. Any attempt to change a string literal results in undefined behavior.
So this code snippet has undefined behavior
char *p = "abcdefghijk";
p[0] = 'A';
But you could create your own character array initializing it with the string literal and in this case you can change the array. For example
char s[] = "abcdefghijk";
char *p = s;
p[0] = 'A';
From the C Standard (6.4.5 String literals)
7 It is unspecified whether these arrays are distinct provided their
elements have the appropriate values. If the program attempts to
modify such an array, the behavior is undefined.
Pay attention to this part of the quote
It is unspecified whether these arrays are distinct provided their
elements have the appropriate values.
It means that, for example, if you write
char *p = "abcdefghijk";
char *q = "abcdefghijk";
then it is not necessary that this expression yields true (integer value 1)
p == q
and the result depends on compiler options whether the same string literals are stored as one array or as distinct arrays.
In C a string literal like "abcdefghijk" is actually stored as an (read-only) array of characters. The assignment makes p point to the first character of that array.
I note that you mention p = &i where i would be an array. That is in most cases wrong. Arrays naturally decay to pointers to their first element, i.e. p = i is equivalent to p = &i[0].
While both &i and &i[0] would result in the same address, they are semantically very different. Let's take an example:
char array[10];
With the above definition doing &array[0] (or just plain array as explained just above) you get a pointer to char, i.e. char *. When doing &array you get a pointer to an array of ten characters, i.e. char (*)[10]. The two types are very different.
"abcdefghijk" is a string constant, and p="abcdefghijk"; gives p the address of this string.
So it's normal that printf("%s",p); displays this string without error.
p="abcdefghijk";
You are creating a string literal (on many implementations it is placed in the code segment or another read-only section) and assigning the address of the first character of the literal to the pointer. As the pointer is not constant, you can assign it again with different addresses.
The string literal "abcdefghijk" is compiled by putting the characters in a block in the program's data or text segment. Then your assignment of it to the pointer assigns the address of that location to the pointer.

Something I don't get about C strings

A few questions regarding C strings:
1. Are both char* and char[] pointers?
2. I've learned about pointers and I can tell that char* is a pointer, but why is it automatically a string and not just a char pointer that points to 1 char? Why can it hold strings?
3. Why, unlike with other pointers, when you assign a new value to a char* pointer are you actually allocating new space in memory to store the new value, whereas with other pointers you just replace the value stored at the memory address the pointer is pointing to?
A pointer is not a string.
A string is an object having type array of char with the additional property that the last element of the array is the null character '\0', which in turn is an int value (converted to char type) having the integer value 0.
char* is a pointer, but char[] is not. The type char[] is not a "real" type, but an incomplete type. The C language is specified in such a way that, at the moment you define a concrete variable (object) of array of char type, the size of the array is determined in one way or another. Thus, no variable has type char[], because this is not a complete type for an object.
However, an expression of type array of N char is automatically converted ("promoted") to char *, that is, a pointer to char pointing to the initial element of the array.
On the other hand, this promotion is not always performed. For example, the operator sizeof() will give different results for char* than for an array of N chars. In the former case, the size of a pointer to char is given (which is in general the same amount for every pointer...), and in the last case gives you the value N, that is, the size of the array.
The situation is different when you declare function parameters as char* and char[]: there the two declarations are equivalent, because an array parameter is adjusted to a pointer and the function cannot know the size of the array.
Actually, you are right here: char * is a pointer to just 1 character object. However, it can be used to access strings, as I will now explain. In paragraph 1 I showed that strings are objects in memory having type array of N chars for some N. This value N is big enough to hold an ending null character (as every "string" in C is supposed to have).
So, what's the deal here?
The key to understanding these issues is the concept of an object (in memory).
When you have a string or, more generally, an array of char, this means that you have figured out some manner to hold an array object in memory.
This object determines a portion of RAM memory that you can access safely, because C has assigned enough memory for it.
Thus, when you point to the first byte of this object with a char* variable, actually you have guaranteed access to all the adjacent elements to the "right" of that memory place, because those places are well defined by C as having the bytes of the array above.
Briefly: the adjacent (to the right) bytes of the byte pointed by a char* variable can be accessed, they are valid places to access, so the pointer can be "iterated" to walk through these bytes, up to the end of the string, without "risks", since all the bytes in an array are contiguous well defined positions in memory.
This is a complicated question, but it reveals that you are not understanding the relationship between pointers, arrays, and string literals in C.
A pointer is just a variable pointing to a position in memory.
A pointer to char points to just 1 object having type char.
If the adjacent bytes of the pointed position correspond to an array of chars, they will be accessible by the pointer, so the pointer can "walk on" the memory bytes occupied by the array object.
A string literal is treated as an array of char object, to which an ending byte with value 0 (the null character) is implicitly added.
In any case, an array of T object has a well defined "size".
A string literal has an additional property: it's a constant object.
Try to fit and gather these concepts in your mind to figure out what's going on.
And ask me for clarification.
ADDITIONAL REMARKS:
Consider the following piece of code:
#include <stdio.h>
int main(void)
{
char *s1 = "not modifiable";
char s2[] = "modifiable";
printf("%s ---- %s\n\n", s1, s2);
printf("Size of array s2: %d\n\n", (int)sizeof(s2));
s2[1] = '0', s2[3] = s2[5] = '1', s2[4] = '7',
s2[6] = '4', s2[7] = '8', s2[9] = '3';
printf("New value of s2: %s\n\n",s2);
//s1[0] = 'X'; // Attempting to modify s1
}
In the definition and initialization of s1 we have the string literal "not modifiable", which has constant content and constant address. Its address is assigned to the pointer s1 as initialization.
Any attempt to modify the bytes of the string has undefined behavior (often a crash), because the array content is read-only.
In the definition and initialization of s2, we have the string literal "modifiable", which has, again, constant content and constant address. However, what happens now is that, as part of the initialization, the content of the string is copied to the array of char s2. The size of the array s2 is not specified (the declaration char s2[] gives an incomplete type), but after initialization the size of the array is well determined and defined as the exact size of the copied string (plus 1 character used to hold the null character, or end-of-string mark).
So, the string literal "modifiable" is used to initialize the bytes of the array s2, which is modifiable.
The right way to do that is by changing one character at a time.
For handier ways of modifying and assigning strings, use the standard header <string.h>.
char *s is a pointer, char s[] is an array of characters. Ex.
char *s = "hello";
char c[] = "world";
s = c; //Legal
// c = "other string"; // Illegal: an array cannot be assigned to
char *s is not a string; it points to an address. Ex
char c[] = "hello";
char *s = &c[3];
Assigning a pointer is not creating memory; you are pointing to memory. Ex.
char *s = "hello";
In this example when you type "hello" you are creating special memory to hold the string "hello" but that has nothing to do with the pointer, the pointer simply points to that spot.

Array syntax and pointers in C

Is my understanding of arrays in C correct?
Arrays are nothing more than a syntactic convenience such that, for instance, when you declare in your C code an array:
type my_array[x];
the compiler sees it as something equivalent to:
type *my_array = malloc(sizeof(*my_array) * x);
with a free system call that releases my_array once we leave the scope of my_array.
Once my_array is declared
my_array[y];
is nothing more but:
*(my_array + y)
Transposing this to character strings; I was also wondering what was happening behind the curtain with
char *my_string = "Hello"
and
my_string = "Hello"
No, an array object is an array object. C has some odd rules that make it appear that arrays and pointers are the same thing, or at least very similar, but they very definitely are not.
This declaration:
int my_array[100];
creates an array object; the object's size is 100 * sizeof (int). It does not create a pointer object.
There is no malloc(), even implicitly. Storage for my_array is allocated the same way as storage for any object declared in the same scope.
What may be confusing you is that, in most but not all contexts, an expression of array type is implicitly converted to a pointer to the array's first element. (This gives you a pointer value; there's still no pointer object.) This conversion doesn't happen if the array expression is the operand of a unary & or sizeof. &my_array gives you the address of the array, not of some nonexistent pointer object. sizeof my_array is the size of the entire array (100 * sizeof (int)), not the size of a pointer.
Also, if you define a function parameter with an array type:
void func(int param[]) { ... }
it's adjusted at compile time to a pointer:
void func(int *param) { ... }
This isn't a conversion; in that context (and only in that context), int param[] really means int *param.
Also, array indexing:
my_array[3] = 42;
is defined in terms of pointer arithmetic -- which means that the prefix my_array has to be converted to a pointer before you can index into it.
The most important thing to remember is this: Arrays are not pointers. Pointers are not arrays.
Section 6 of the comp.lang.c FAQ explains all this very well.
Once my_array is declared
my_array[y];
is nothing more but :
*(my_array + y)
Yes, because my_array is converted to a pointer, and the [] operator is defined so that x[y] means *(x+y).
Transposing this to character strings; i was also wondering what was
happening behind the curtain with
char *my_string = "Hello"
and
my_string = "Hello"
"Hello" is a string literal. It's an expression of type char[6], referring to an anonymous statically allocated array object. If it appears on the RHS of an assignment or initializer, it's converted, like any array expression, to a pointer. The first line initializes my_string so it points to the first character of "Hello". The second is a pointer assignment that does the same thing.
So what about this?
char str[] = "Hello";
This is the third context in which array-to-pointer conversion doesn't happen. str takes its size from the size of the string literal, and the array is copied to str. It's the same as:
char str[] = { 'H', 'e', 'l', 'l', 'o', '\0' };
No!
type array[n] is an array variable, typically stored on the stack when it is declared inside a function
type *array is a pointer variable, stored on the stack too. But after array = malloc(sizeof(*array) * n); it will point to some data on the heap
If it walks like a duck, swims like a duck and flies like a duck, then it is a duck.
So, let's see. Arrays and pointers have some common attributes as you correctly described, however, you can see there are some differences.

Resources